code model
A system of regions (also referred to as a network) can comprise multiple disjoint regions that exhibit shared activity patterns across a range of tasks. The auditory system is located in the superior temporal region of the brain. This region uniquely encodes pitch, speech, and music, but is not involved in high-level language comprehension and production [Norman-Haignere et al., 2015, 2019]. In our experiments pertaining to programming language comprehension, we use the activity seen in the auditory system as a negative control. For the Python program comprehension experiment, individual programs were modeled using the period from the onset of the code/sentence problem until the button press. See Fedorenko et al. [2010] for a discussion of the functional localization approach as it pertains to the language network.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Portugal > Castelo Branco > Castelo Branco (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine (0.68)
Spectral Signature in Code Backdoor Detection, how far are we?
Le, Quoc Hung, Le-Cong, Thanh, Le, Bach, Xu, Bowen
As Large Language Models (LLMs) become increasingly integrated into software development workflows, they also become prime targets for adversarial attacks. Among these, backdoor attacks are a significant threat, allowing attackers to manipulate model outputs through hidden triggers embedded in training data. Detecting such backdoors remains a challenge, and one promising approach is the use of Spectral Signature defense methods, which identify poisoned data by analyzing feature representations through eigenvectors. While some prior works have explored Spectral Signatures for backdoor detection in neural networks, recent studies suggest that these methods may not be optimally effective for code models. In this paper, we revisit the applicability of Spectral Signature-based defenses in the context of backdoor attacks on code models. We systematically evaluate their effectiveness under various attack scenarios and defense configurations, analyzing their strengths and limitations. We find that the widely used Spectral Signature configuration for code backdoor detection is often suboptimal, so we explore the impact of different settings of its key factors. We also identify a new proxy metric that more accurately estimates the actual performance of Spectral Signature without retraining the model after applying the defense.
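For readers unfamiliar with the technique, the core of a Spectral Signature defense (in the sense of Tran et al., 2018) can be sketched in a few lines: score each training sample by its squared projection onto the top singular vector of the centered representation matrix, then drop the highest-scoring fraction. This is a minimal illustration, not the configuration evaluated in the paper; the `eps` threshold and the choice of representation are exactly the kind of key factors whose settings the paper examines.

```python
import numpy as np

def spectral_signature_scores(reps):
    """Outlier score per sample: squared projection onto the top
    right-singular vector of the centered representation matrix."""
    centered = reps - reps.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

def filter_suspected_poison(reps, eps=0.05):
    """Keep the (1 - eps) fraction of samples with the lowest scores;
    the rest are flagged as likely poisoned."""
    scores = spectral_signature_scores(reps)
    n_keep = int(len(scores) * (1 - eps))
    return np.argsort(scores)[:n_keep]
```

Because the poisoned samples share a trigger, their representations tend to concentrate along one direction of maximal variance, which is what the top singular vector picks up.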
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > North Carolina (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.88)
Localizing Malicious Outputs from CodeLLM
Borana, Mayukh, Liang, Junyi, Rajan, Sai Sathiesh, Chattopadhyay, Sudipta
We introduce FreqRank, a mutation-based defense to localize malicious components in LLM outputs and their corresponding backdoor triggers. FreqRank assumes that the malicious sub-string(s) consistently appear in outputs for triggered inputs and uses a frequency-based ranking system to identify them. Our ranking system then leverages this knowledge to localize the backdoor triggers present in the inputs. We create nine malicious models through fine-tuning or custom instructions for three downstream tasks, namely, code completion (CC), code generation (CG), and code summarization (CS), and show that they have an average attack success rate (ASR) of 86.6%. Furthermore, FreqRank's ranking system highlights the malicious outputs as one of the top five suggestions in 98% of cases. We also demonstrate that FreqRank's effectiveness scales as the number of mutants increases and show that FreqRank is capable of localizing the backdoor trigger effectively even with a limited number of triggered samples. Finally, we show that our approach is 35-50% more effective than other defense methods.
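As a rough illustration of the frequency-based ranking idea (a sketch of the general principle only, not FreqRank's actual implementation), one can count how many mutant outputs each token n-gram appears in and surface the most persistent ones:

```python
from collections import Counter

def freq_rank(mutant_outputs, n=3, top_k=5):
    """Rank token n-grams by document frequency across outputs produced
    for mutated versions of the same input. Sub-strings that persist
    across mutations are candidate malicious insertions."""
    counts = Counter()
    for output in mutant_outputs:
        tokens = output.split()
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        counts.update(grams)  # count each n-gram once per output
    return [" ".join(gram) for gram, _ in counts.most_common(top_k)]
```

The intuition matches the paper's assumption: benign content varies across mutants, while a backdoor payload recurs verbatim, so it floats to the top of the ranking.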
A Brain regions
A system of regions (also referred to as a network) can comprise multiple disjoint regions that exhibit shared activity patterns across a range of tasks. The auditory system is located in the superior temporal region of the brain. The voxels were then filtered using gray-matter masking and (for the MD and language systems) network localization. See Fedorenko et al. [2010] for a discussion of the functional localization approach as it pertains to the language network. For each brain system and each code property or code model, we run a separate MVPA analysis.
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Portugal > Castelo Branco > Castelo Branco (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine (0.68)
A Code Comprehension Benchmark for Large Language Models for Code
Havare, Jayant, Chaudhary, Saurav, Ramakrishnan, Ganesh, Maharajan, Kaushik, Tamilselvam, Srikanth
Large Language Models have shown impressive capabilities in coding tasks like code generation and code completion, as they have been trained on a large amount of code data. Moreover, since one of the core pretraining objectives is Next Token Prediction, these models tend to learn surface-level syntactic patterns in code. However, this does not guarantee code comprehension ability, i.e., the ability to capture the semantics of the code. We believe this is why these models often underperform on tasks that require deeper semantic understanding, such as code debugging and code optimization. To address this, we propose fine-tuning these models specifically for code comprehension tasks using large-scale datasets, enabling them to develop a more robust understanding of code semantics. We evaluate three code models of varying sizes on a suite of code comprehension tasks designed to assess semantic understanding beyond surface-level syntactic pattern matching. In particular, we analyze performance on the Subjectivity Grading Task and observe that model performance improves after fine-tuning on relevant downstream tasks. The most significant improvement is seen in the QWQ-32B model, where accuracy increases from 70% to 83.47%. A similar trend is observed across the other models, clearly indicating an enhancement in code comprehension ability. Among the models studied, the DPO-fine-tuned Codestral-22B achieves the highest micro-accuracy of 87.66% on the Subjectivity Grading Task.
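Since the abstract reports micro-accuracy, it may help to recall how that metric differs from a macro average. A minimal sketch (our own illustration, not the paper's evaluation code):

```python
from collections import defaultdict

def micro_accuracy(preds, golds):
    """Pool all predictions: fraction correct overall."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def macro_accuracy(preds, golds):
    """Average of per-class accuracies: every class weighs equally."""
    per_class = defaultdict(lambda: [0, 0])  # gold label -> [correct, total]
    for p, g in zip(preds, golds):
        per_class[g][0] += int(p == g)
        per_class[g][1] += 1
    return sum(c / t for c, t in per_class.values()) / len(per_class)
```

On a skewed grading task, micro-accuracy is dominated by the most frequent grade, which is worth keeping in mind when comparing the reported figures.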
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks
Khant, Kyi Shin, Lin, Hong Yi, Thongtanunam, Patanamon
Learning-based techniques, especially advanced pre-trained models for code, have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training approaches may not fully optimize model performance, as they typically involve learning from randomly shuffled training data. Recent work shows that Curriculum Learning (CL) can improve performance on code-related tasks through incremental learning based on the difficulty of synthetic code. Yet, the effectiveness of CL with conventional difficulty measures in SE tasks remains largely unexplored. In this study, we explore two conventional code metrics, code length and cyclomatic complexity, to determine difficulty levels. We investigate how a pre-trained code model (CodeT5) learns under CL through the tasks of code clone detection and code summarization. Our empirical study on the CodeXGLUE benchmark showed results contrasting with prior studies: the model exhibited signs of catastrophic forgetting and shortcut learning. Surprisingly, model performance saturates after only the first quartile of training, potentially indicating a limit in the model's representation capacity and/or the task's inherent difficulty. Future work should further explore various CL strategies with different code models across a wider range of SE tasks for a more holistic understanding.
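The easy-to-hard ordering described above can be sketched as follows; the keyword-counting proxy for cyclomatic complexity is our own simplification (a real study would use a proper complexity analyzer):

```python
import re

# Crude branching-construct pattern; a stand-in for true CC analysis.
_BRANCHES = re.compile(r"\b(if|elif|else|for|while|case|catch|and|or)\b")

def cyclomatic_proxy(code):
    """Rough cyclomatic-complexity proxy:
    1 + number of branching keywords/operators."""
    return 1 + len(_BRANCHES.findall(code))

def curriculum_order(samples, metric=cyclomatic_proxy):
    """Sort training samples from easy to hard, the ordering used in
    difficulty-based Curriculum Learning (code length works the same
    way with metric=len)."""
    return sorted(samples, key=metric)
```

Training then proceeds over the sorted list in difficulty quartiles, which is the setting where the paper observes performance saturating after the first quartile.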
Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay
Chen, Yuyang, Zhao, Kaiyan, Wang, Yiming, Yang, Ming, Zhang, Jian, Niu, Xiaoguang
Nowadays, transformer-based Large Language Models (LLMs) for code generation tasks usually apply sampling and filtering pipelines. Due to the sparse reward problem in code generation tasks caused by one-token incorrectness, transformer-based models will sample redundant programs until they find a correct one, leading to low efficiency. To overcome this challenge, we incorporate Experience Replay (ER) in the fine-tuning phase, where generated programs are stored and later replayed to give the LLM agent a chance to learn from past experiences. In the spirit of ER, we introduce a novel approach, the BTP pipeline, which consists of three phases: a beam search sampling phase, a testing phase, and a prioritized experience replay phase. The approach makes use of failed programs collected by code models and replays programs with a high Possibility and Pass-rate Prioritized value (P2Value) from the replay buffer to improve efficiency. P2Value jointly considers the possibility of the transformer's output and the pass rate, and can make use of the redundant resources arising from the fact that most programs collected by LLMs fail to pass any tests. We empirically apply our approach to several LLMs, demonstrating that it enhances their performance in code generation tasks and surpasses existing baselines.
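The replay-phase bookkeeping can be sketched as follows; the exact form of P2Value here (sequence probability weighted by pass rate) and the `ReplayBuffer` API are our illustrative assumptions, not the paper's definitions:

```python
import heapq
import math
from dataclasses import dataclass, field

def p2value(log_prob, pass_rate):
    """Illustrative P2Value: the model's sequence probability
    ("possibility") weighted by the fraction of tests passed."""
    return math.exp(log_prob) * (0.5 + 0.5 * pass_rate)

@dataclass(order=True)
class _Entry:
    neg_priority: float                  # min-heap, so store negated
    program: str = field(compare=False)

class ReplayBuffer:
    """Keeps the highest-P2Value programs seen so far."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._heap = []

    def add(self, program, log_prob, pass_rate):
        heapq.heappush(self._heap,
                       _Entry(-p2value(log_prob, pass_rate), program))
        if len(self._heap) > self.capacity:      # evict lowest-priority
            self._heap = heapq.nsmallest(self.capacity, self._heap)
            heapq.heapify(self._heap)

    def replay(self, k):
        """Top-k programs by P2Value, e.g. for a fine-tuning batch."""
        return [e.program for e in heapq.nsmallest(k, self._heap)]
```

Storing even failed programs and replaying the most promising ones is what lets the pipeline recover value from samples that would otherwise be discarded.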
- Asia > Macao (0.14)
- Asia > China > Hubei Province > Wuhan (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.46)
CoCoP: Enhancing Text Classification with LLM through Code Completion Prompt
Mohajeri, Mohammad Mahdi, Dousti, Mohammad Javad, Ahmadabadi, Majid Nili
Text classification is a fundamental task in natural language processing (NLP), and large language models (LLMs) have demonstrated their capability to perform this task across various domains. However, the performance of LLMs heavily depends on the quality of their input prompts. Recent studies have also shown that LLMs exhibit remarkable results in code-related tasks. To leverage the capabilities of LLMs in text classification, we propose the Code Completion Prompt (CoCoP) method, which transforms the text classification problem into a code completion task. CoCoP significantly improves text classification performance across diverse datasets by utilizing LLMs' code-completion capability. For instance, CoCoP enhances the accuracy on the SST2 dataset by more than 20%. Moreover, when CoCoP is integrated with LLMs specifically designed for code-related tasks (code models), such as CodeLLaMA, it demonstrates better or comparable performance to few-shot learning techniques while using only one-tenth of the model size. The source code of our proposed method will be made publicly available upon acceptance of the paper.
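To make the idea concrete, here is one plausible way to phrase sentiment classification as code completion; the template is our own guess at the general style, not the paper's exact prompt:

```python
def cocop_prompt(text, labels, few_shot=()):
    """Phrase a classification instance as a code-completion task:
    the model is asked to complete the quoted return value."""
    prompt = (
        "def classify(text: str) -> str:\n"
        f'    """Return one of {sorted(labels)}."""\n\n'
    )
    for example_text, example_label in few_shot:
        prompt += f'classify("{example_text}")  # -> "{example_label}"\n'
    prompt += f'classify("{text}")  # -> "'
    return prompt
```

The completion the LLM produces after the final quote is then taken as the predicted label, so a code model can solve the task with its native completion objective.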